The Hinoki Sensebank - A Large-Scale Word Sense Tagged Corpus Of Japanese
Abstract
While there has been considerable research on both structural annotation (such as the Penn Treebank (Taylor et al., 2003) or the Kyoto Corpus (Kurohashi and Nagao, 2003)) and semantic annotation (e.g. Senseval: Kilgarriff and Rosenzweig, 2000; Shirai, 2002), there are almost no corpora that combine both. This makes it difficult to carry out research on the interaction between syntax and semantics. Projects such as the Penn Propbank are adding structural semantics (i.e. predicate-argument structure) to syntactically annotated corpora, but not lexical semantic information (i.e. word senses). Other corpora, such as the English Redwoods Corpus (Oepen et al., 2002), combine syntax and structural semantics in a monostratal representation, but still have no lexical semantics. In this paper we discuss the (lexical) semantic annotation for the Hinoki Corpus, which is part of a larger project in psycholinguistics and computational linguistics ultimately aimed at language understanding (Bond et al., 2004).
Similar Resources
Design and Prototype of a Large-Scale and Fully Sense-Tagged Corpus
Sense-tagged corpora play a crucial role in Natural Language Processing, especially in research on word sense disambiguation and natural language understanding. A large-scale Chinese sense-tagged corpus is essential, yet the lack of such a corpus is a critical deficiency at the current stage. This paper aims to design a large-scale Chinese full text se...
Construction of a Word Sense Tagged Corpus for SENSEVAL-2 Japanese Dictionary Task
This paper reports the details of a Japanese word sense tagged corpus developed as evaluation data for the SENSEVAL-2 Japanese dictionary task. The corpus is made up of 2,130 newspaper articles. Not all words were annotated: only 10,000 words in the articles were manually annotated with sense IDs, and these were used as gold standard data. Word senses were defined according to the Iwanami Kokugo Jiten, a Japanese dictio...
Getting Serious About Word Sense Disambiguation
Recent advances in large-scale, broad coverage part-of-speech tagging and syntactic parsing have been achieved in no small part due to the availability of large amounts of online, human-annotated corpora. In this paper, I argue that a large, human sense-tagged corpus is also critical as well as necessary to achieve broad coverage, high accuracy word sense disambiguation, where the sense distinct...
Word Sense Disambiguation Using Heterogeneous Language Resources
This paper proposes a robust method for word sense disambiguation of Japanese. We combined several classifiers using heterogeneous language resources, a machine readable dictionary and a word sense tagged corpus. According to our experimental results, our method outperformed the best single classifier for recall and applicability.
Korean Word-Sense Disambiguation Using Parallel Corpus as Additional Resource
Most previous research on Korean Word-Sense Disambiguation (WSD) has focused on unsupervised corpus-based or knowledge-based approaches because of the lack of sense-tagged Korean corpora. Recently, along with great effort of constructing sense-tagged Korean corpus by government and researchers, finding appropriate features for supervised learning approach and improving its prediction ...